Another Hierarchical Topic Model
نویسنده
چکیده
We describe a hierarchical topic model. We assume that there are various levels of specificity in a document collection. For example, a collection of mailing list posts might be organized according to sentence, paragraph, post and thread. We describe a model that captures the structure at each level of the hierarchy. We use a trace norm penalty on a matrix composed of natural parameters for the multinomial model. 1 The Basic Model We consider a probabilitistic model of text. We assume that a set of documents is generated in two stages. First, a set of document models are generated according to a prior model. Then, words for each document are generated according to that document’s model. We assume that a document’s term frequencies are generated independently of other documents’, when conditioning on the document’s model. We use a trace norm to penalize document models’ divergence from the prior model, effectively placing a Gaussian prior on the singular vectors of the matrix composed of stacked document model parameter vectors. We use the rest of this section to describe the model in detail. Let φ be the multinomial natural parameter vector for the “prior” model; φ represents somewhat of a “center” from which the individual document models emanate. Each document has its own multinomial model, with a natural parameter vector, θi. We define the prior on the document models as a characterization of where the document models are located with respect to the prior
منابع مشابه
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
متن کاملTopic Model Stability for Hierarchical Summarization
We envisioned responsive generic hierarchical text summarization with summaries organized by topic and paragraph based on hierarchical structure topic models. But we had to be sure that topic models were stable for the sampled corpora. To that end we developed a methodology for aligning multiple hierarchical structure topic models run over the same corpus under similar conditions, calculating a...
متن کاملیک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملSSHLDA: A Semi-Supervised Hierarchical Topic Model
Supervised hierarchical topic modeling and unsupervised hierarchical topic modeling are usually used to obtain hierarchical topics, such as hLLDA and hLDA. Supervised hierarchical topic modeling makes heavy use of the information from observed hierarchical labels, but cannot explore new topics; while unsupervised hierarchical topic modeling is able to detect automatically new topics in the data...
متن کاملAuthor Disambiguation: A Nonparametric Topic and Co-authorship Model
A fully generative model is provided for the problem of author disambiguation. This approach infers the topics for each author and combines that with co-author information. The problems involved are similar to other entity resolution problems where differing references may refer to one author entity and identical references may refer to different author entities. We extend the hierarchical Diri...
متن کاملHierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration
Most Web page classification models typically apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigne...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005